 Far Eastern Federal District


From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning

arXiv.org Artificial Intelligence

Finetuning provides a scalable and cost-effective means of customizing language models for specific tasks or response styles, with greater reliability than prompting or in-context learning. In contrast, the conventional wisdom is that injecting knowledge via finetuning results in brittle performance and poor generalization. We argue that the dichotomy of "task customization" (e.g., instruction tuning) and "knowledge injection" (e.g., teaching new facts) is a distinction without a difference. We instead identify concrete factors that explain the heterogeneous effectiveness observed with finetuning. To this end, we conduct a large-scale experimental study of finetuning the frontier Gemini v1.5 model family on a spectrum of datasets that are artificially engineered to interpolate between the strengths and failure modes of finetuning. Our findings indicate that question-answer training data formats provide much stronger knowledge generalization than document/article-style training data, numerical information can be harder for finetuning to retain than categorical information, and models struggle to apply finetuned knowledge during multi-step reasoning even when trained on similar examples -- all factors that render "knowledge injection" to be especially difficult, even after controlling for considerations like data augmentation and information volume. On the other hand, our findings also indicate that it is not fundamentally more difficult to finetune information about a real-world event than information about what a model's writing style should be.


Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM

arXiv.org Artificial Intelligence

Due to the exponential growth of information and the need for efficient information consumption, the task of summarization has gained paramount importance. Evaluating summarization accurately and objectively presents significant challenges, particularly when dealing with long and unstructured texts rich in content. Existing methods, such as ROUGE (Lin, 2004) and embedding similarities, often yield scores that have low correlation with human judgements and are also not intuitively understandable, making it difficult to gauge the true quality of the summaries. LLMs can mimic humans in giving subjective reviews, but subjective scores are hard to interpret and justify. These scores can also be easily manipulated by altering the models and the tones of the prompts. In this paper, we introduce a novel evaluation methodology and tooling designed to address these challenges, providing a more comprehensive, accurate and interpretable assessment of summarization outputs. Our method (SumAutoEval) proposes and evaluates metrics at varying granularity levels, giving objective scores on four key dimensions: completeness, correctness, alignment and readability. We empirically demonstrate that SumAutoEval enhances the understanding of output quality with better human correlation.
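For readers unfamiliar with the ROUGE baseline mentioned in the abstract above, the following is a minimal sketch of a ROUGE-1 style unigram overlap score. It is not the SumAutoEval method and not the official ROUGE implementation: the function name `rouge1`, the whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """ROUGE-1 style unigram overlap (illustrative sketch, not the official scorer).

    Tokenization is plain lowercased whitespace splitting, which is an
    assumption; real ROUGE tooling applies its own tokenization and stemming.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())

    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example sentences: five of six unigrams overlap.
scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

Because the score only counts shared surface words, a summary that swaps one verb and changes the meaning can still score highly, which is one intuition behind the abstract's remark that such scores correlate weakly with human judgement.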


Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

arXiv.org Artificial Intelligence

Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, thereby introducing a more precise and automated solution. Leveraging Vision-Language Models (VLM) to simultaneously process visual and textual data, we offer an effective aid to enhance the analysis process of weather heatmaps. Our initial assessment of general-purpose VLMs (e.g., GPT-4-Vision) on EWED revealed poor performance, characterized by low accuracy and frequent hallucinations due to inadequate color differentiation and insufficient meteorological knowledge. To address these challenges, we introduce ClimateIQA, the first meteorological VQA dataset, which includes 8,760 wind gust heatmaps and 254,040 question-answer pairs covering four question types, both generated from the latest climate reanalysis data. We also propose Sparse Position and Outline Tracking (SPOT), an innovative technique that leverages OpenCV and K-Means clustering to capture and depict color contours in heatmaps, providing ClimateIQA with more accurate color spatial location information. Finally, we present Climate-Zoo, the first meteorological VLM collection, which adapts VLMs to meteorological applications using the ClimateIQA dataset. Experiment results demonstrate that models from Climate-Zoo substantially outperform state-of-the-art general VLMs, achieving an accuracy increase from 0% to over 90% in EWED verification. The datasets and models in this study are publicly available for future climate science research: https://github.com/AlexJJJChen/Climate-Zoo.


Early Detection of Bark Beetle Attack Using Remote Sensing and Machine Learning: A Review

arXiv.org Artificial Intelligence

This paper provides a comprehensive review of past and current advances in the early detection of bark beetle-induced tree mortality from three primary perspectives: bark beetle & host interactions, remote sensing (RS), and machine learning/deep learning (ML/DL). In contrast to prior efforts, this review encompasses all RS systems and emphasizes ML/DL methods to investigate their strengths and weaknesses. We parse existing literature based on multi- or hyper-spectral analyses and distill their knowledge based on: bark beetle species & attack phases with a primary emphasis on early stages of attacks, host trees, study regions, RS platforms & sensors, spectral/spatial/temporal resolutions, spectral signatures, spectral vegetation indices (SVIs), ML approaches, learning schemes, task categories, models, algorithms, classes/clusters, features, and DL networks & architectures. Although DL-based methods and the random forest (RF) algorithm showed promising results, highlighting their potential to detect subtle changes across visible, thermal, and short-wave infrared (SWIR) spectral regions, they still have limited effectiveness and high uncertainties. To inspire novel solutions to these shortcomings, we delve into the principal challenges & opportunities from different perspectives, enabling a deeper understanding of the current state of research and guiding future research directions.


NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist

arXiv.org Artificial Intelligence

In this study, we analyze automatic evaluation metrics for Natural Language Generation (NLG), specifically task-agnostic metrics and human-aligned metrics. Task-agnostic metrics, such as Perplexity, BLEU, and BERTScore, are cost-effective and highly adaptable to diverse NLG tasks, yet they correlate weakly with human judgments. Human-aligned metrics (CTC, CtrlEval, UniEval) improve the correlation level by incorporating desirable human-like qualities as training objectives. However, their effectiveness at discerning system-level performance and the quality of system outputs remains unclear. We present a metric preference checklist as a framework to assess the effectiveness of automatic metrics in three NLG tasks: Text Summarization, Dialogue Response Generation, and Controlled Generation. Our proposed framework provides a means (i) to verify whether automatic metrics are faithful to human preference, regardless of their level of correlation with human judgments; and (ii) to inspect the strengths and limitations of NLG systems via pairwise evaluation. We show that automatic metrics provide better guidance than humans on discriminating system-level performance in Text Summarization and Controlled Generation tasks. We also show that the multi-aspect human-aligned metric (UniEval) is not necessarily dominant over single-aspect human-aligned metrics (CTC, CtrlEval) and task-agnostic metrics (BLEU, BERTScore), particularly in Controlled Generation tasks.
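To make the idea of checking a metric against human preference via pairwise evaluation more concrete, here is a minimal sketch. It is not the checklist from the paper: the function `pairwise_agreement`, the system names, and all scores are hypothetical and invented for illustration. The sketch simply measures how often a metric ranks a pair of systems in the same order as human ratings do.

```python
from itertools import combinations


def pairwise_agreement(metric_scores: dict, human_scores: dict) -> float:
    """Fraction of system pairs that the metric orders the same way as humans.

    Pairs that humans rate as tied are skipped (an assumption of this sketch).
    Returns NaN when no comparable pair exists.
    """
    agree = 0
    total = 0
    for a, b in combinations(sorted(metric_scores), 2):
        human_diff = human_scores[a] - human_scores[b]
        metric_diff = metric_scores[a] - metric_scores[b]
        if human_diff == 0:
            continue  # no human preference to be faithful to
        total += 1
        if human_diff * metric_diff > 0:
            agree += 1  # same sign: metric and humans prefer the same system
    return agree / total if total else float("nan")


# Invented illustrative numbers, not results from any paper.
metric = {"sys_a": 0.31, "sys_b": 0.28, "sys_c": 0.40}
human = {"sys_a": 3.9, "sys_b": 3.1, "sys_c": 3.5}

agreement = pairwise_agreement(metric, human)
print(f"pairwise agreement: {agreement:.2f}")
```

A pairwise view like this can diverge from a plain correlation coefficient, since it only asks whether each head-to-head preference is preserved, not how closely the score magnitudes track each other.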


US says it is shuttering last 2 consulates in Russia

FOX News

WASHINGTON - The Trump administration has notified Congress that it intends to shutter the last two remaining U.S. consulates in Russia. The State Department told lawmakers last week that it would permanently close the consulate in the far eastern Russian city of Vladivostok and temporarily suspend operations at the consulate in Yekaterinburg just east of the Ural Mountains. The notice was sent to Congress on Dec. 10 but received little attention at the time. That timing predates by three days the public emergence of news about a major suspected Russian computer intrusion into U.S. government and private computer systems that has raised grave cybersecurity fears.


Establishing partnerships in AI, Robotics & Skilling at the 5th Eastern Economic Forum in Russia - Times of India

#artificialintelligence

Prime Minister Narendra Modi recently attended the 5th Eastern Economic Forum (EEF) in Vladivostok, Russia. PM Modi was invited as the chief guest of the forum by Mr. Vladimir Putin, the President of Russia. Speaking at the forum, PM Modi announced that India would extend a $1 billion line of credit towards the development of the Russian Far East. He noted that this move was in line with India's 'Act East' policy. Along with PM Modi, a delegation of business leaders from India attended the forum.


Prediction of Porosity and Permeability Alteration based on Machine Learning Algorithms

arXiv.org Machine Learning

The objective of this work is to study the applicability of various Machine Learning algorithms for predicting certain rock properties that geoscientists usually determine through special laboratory analysis. We demonstrate that these special properties can be predicted based only on routine core analysis (RCA) data. To validate the approach, core samples from a reservoir with soluble rock matrix components (salts) were tested in 100+ laboratory experiments. The challenge of the experiments was to characterize the salt content of the cores and the alteration of porosity and permeability after reservoir desalination due to drilling mud or water injection. For these three measured characteristics, we developed the relevant predictive models, which were based on the results of RCA and data on coring depth and the top and bottom depths of productive horizons. To select the most accurate Machine Learning algorithm, a comparative analysis was performed. It showed that different algorithms work better in different models. However, a neural network with two hidden layers demonstrated the best predictive ability and generalizability for all three rock characteristics jointly. The other algorithms, such as Support Vector Machine and Linear Regression, also worked well on the dataset, but only in particular cases. Overall, the applied approach allows predicting the alteration of porosity and permeability during desalination in porous rocks and also evaluating salt concentration without direct measurements in a laboratory. This work also shows that the developed approaches could be applied to the prediction of other rock properties (residual brine and oil saturations, relative permeability, capillary pressure, and others) whose laboratory measurements are time-consuming and expensive.


Russia's new mail delivery drone crashes into wall during inaugural flight

The Independent - Tech

A postal drone in Russia crashed into a wall and smashed into pieces during its maiden flight. The unmanned aerial vehicle took off to deliver a small package to a village near Ulan-Ude, a city in Siberia, but hit a three-storey building shortly after lifting off from a mini launch pad in front of a crowd of spectators. The drone had been touted as a new way to deliver post in the rural Buryatia region, located more than 2,700 miles from the Russian capital Moscow. Video footage of the crash showed the vehicle taking off before veering into the apartment building and showering onlookers with debris. No one was harmed in the incident.


Russian postal drone crashes into a wall at top speed on its maiden flight in Ulan-Ude Siberia

Daily Mail - Science & tech

A Russian-made drone on its way to making its first parcel delivery has crashed into a wall just moments after taking off. The smash shocked local residents and regional officials who had gathered in the Siberian city of Ulan-Ude on Monday to watch the drone's maiden flight. The drone was sent to deliver a small package to a neighbouring village in the sparsely populated Buryatia region, more than 4,400km east of Moscow. Video footage showed the drone lifting off from a miniature launch pad bearing Russian Post's blue and white logo. A small crowd of spectators, present for the ceremony intended to showcase a new way to deliver mail in the region, were heard uttering expletives after the crash.